How to aggregate data to family level

Individuals can be linked to a family number, which can be used to aggregate data to family level. All individuals belonging to the same family are registered with the same family number, which is the personal ID number of the eldest person in the family.

In the example below, an individual-level dataset is created first. It is then filtered down to persons in families consisting of married couples with small children (code value 2.1.1). Next, demographic information is imported into the dataset.

To create family-level income data, one must first create a separate dataset for this purpose, since variables of different unit types cannot be mixed in a single dataset. The family number and a variable measuring work-related income at individual level are imported into the new dataset, and the collapse (sum) command is then used to sum income to family level (by(famnr)). This results in a dataset with family as unit type.

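Finally, family income is merged into the individual-level dataset using the merge command. Because the family number is the personal ID of the eldest family member, the merge matches only that person; all other individuals get a missing value for family income.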
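The collapse-and-merge logic can be illustrated outside microdata.no with a small sketch. The Python code below is not microdata.no script and is not part of the platform; it only mimics the three steps (sum by family number, merge on personal ID, drop missing values) on made-up toy data. The person IDs, income figures and variable names are all hypothetical.

```python
from collections import defaultdict

# Toy person-level rows: (person_id, family_number, work_income).
# As in the text, the family number is the person id of the eldest member.
# All ids and amounts are invented for illustration.
persons = [
    ("P1", "P1", 500_000),  # eldest person in family P1
    ("P2", "P1", 300_000),
    ("P3", "P1", 0),
    ("P4", "P4", 450_000),  # eldest person in family P4
    ("P5", "P4", 200_000),
]

# Step 1: mimic "collapse (sum) workincome, by(famnr)".
family_income = defaultdict(int)
for person_id, famnr, income in persons:
    family_income[famnr] += income

# Step 2: mimic the merge on personal id. Only persons whose own id
# equals a family number find a match; everyone else gets None (missing).
merged = {person_id: family_income.get(person_id) for person_id, _, _ in persons}

# Step 3: mimic "drop if sysmiss(familyincome)". One row per family remains,
# attached to the eldest person.
family_level = {pid: inc for pid, inc in merged.items() if inc is not None}

for pid, inc in family_level.items():
    print(pid, inc)
```

In this toy data only P1 and P4 keep a value after the merge, which is why dropping the missing rows leaves exactly one record per family.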

//Connect to datastore
require no.ssb.fdb:30 as db

//Create an individual level dataset consisting of persons in families defined by married couples with small children
create-dataset persondata
import db/BEFOLKNING_REGSTAT_FAMTYP 2021-01-01 as famtype
tabulate famtype
keep if famtype == '2.1.1'

//Add demographic information
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
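//birthdate holds birth year and month (YYYYMM); dividing by 100 and truncating gives the birth year
generate age = 2021 - int(birthdate/100)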

import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as municipality
generate county = substr(municipality, 1, 2)

import db/BEFOLKNING_BARN_I_HUSH 2021-01-01 as children


//Create dataset for generating family level income (unit type = family)
create-dataset familydata
import db/BEFOLKNING_REGSTAT_FAMNR 2021-01-01 as famnr
import db/INNTEKT_WYRKINNT 2021-01-01 as workincome
collapse (sum) workincome, by(famnr)
rename workincome familyincome

//Merge family income into individual level dataset (unit type = individuals)
merge familyincome into persondata on PERSONID_1

//Generate family level statistics.
//The family number is the personal id of the eldest person in the family, so the merge only matches that person.
//Dropping individuals with missing family level income therefore leaves one record per family (unit type = family).
//All individual information is then associated with the eldest person in the family.
use persondata
drop if sysmiss(familyincome)

rename age age_oldest
rename gender gender_oldest

define-labels countytxt '03' Oslo '11' Rogaland '15' 'Møre og Romsdal' '18' Nordland '30' Viken '34' Innlandet '38' 'Vestfold og Telemark' '42' Agder '46' Vestland '50' Trøndelag '54' 'Troms og Finnmark' '21' Spitsbergen '25' 'Education abroad' '99' Unknown
assign-labels county countytxt

tabulate county

histogram age_oldest, discrete
histogram children, discrete percent

tabulate children
tabulate children, cellpct
tabulate children gender_oldest

summarize familyincome
barchart (mean) familyincome, by(county)
barchart (mean) familyincome, by(children)
histogram familyincome, freq
histogram familyincome, by(children) percent